In order to test for any differences over multiple time points, once can use a design including the time factor, and then test using the likelihood ratio test (LRT). Here, as we have control (DMSO) and treatment (Decitabine) time series, design formula containing the condition factor, the time factor, and the interaction of the two. In this case, using the likelihood ratio test with a reduced model which does not contain the interaction terms will test whether the condition induces a change in gene expression at any time point after the reference level time point (time 0).
(see DESeq2 Time-series-experiments for more details)
I'm mapping hg38/gencode.v34 to the fastq files using salmon.
Raw samples:
ls ../hl60-fastq/*fastq.gz
%%R
# meta
treats <- rep(c(rep('DMSO',2), rep('treated',2)),3)
reps <- rep(c('rep1','rep2'),6)
hours <- c(rep('120h',4),rep('6h',4),rep('72h',4))
colData <- data.frame(
time=hours,
cond=treats,
sample_id=paste(hours, treats, reps, sep='_'),
row.names=colnames(txi$abundance))
colData
Initial principal component analysis (PCA) shows the second treated biological replicate at 72h time point, behaves as an outlier. Removing that from the analysis give us a better representation of our dataset. In the second PCA plot, we can see that treated samples at 6h cluster with the non-treated samples which suggest that 6 hours treatment with the drug is not as effective as 72h and 120h. Although, we will check the variant genes in this time-point in the following statistical analysis.
%%R
dds.pca <- DESeq(dds0, parallel=TRUE)
# results
vsd <- varianceStabilizingTransformation(dds.pca)
pca = plot_PCA(vsd, colData)
pca
%%R
p / (p.U + p.T)
I'm doing differential expression analysis at 6h,72h and 120h plus time-independent comparison. Volcano-plot for all four tests included. Notably, I'm replacing results for time dependent tests with missing values or less than 0.05 adjust p-values with results from time independent test.
In addition, I use time as continues variable to run seprate analysis.
Time as factor variable
%%R
resultsNames(dds)
%%R
vol = vol.6h / vol.72h / vol.120h
vol
Cluster high variant genes over time.
%%R
library(pheatmap)
# mostVar Calculate the top n most variable genes in a matrix of gene expression data
# https://rdrr.io/github/abc-igmm/transcripTools/man/mostVar.html
mostVar <- function(data, n, i_want_most_var = FALSE) {
data.var <- apply(data, 1, stats::var)
data[order(data.var, decreasing = i_want_most_var)[1:n],]
}
fc <- df[!duplicated(df$gene_name),c('gene_name','log2FC_6h','log2FC_72h','log2FC_120h')] %>% remove_rownames %>% column_to_rownames('gene_name')
# scale - Z-Score
fc <- data.frame(apply(fc,2,scale, center=TRUE, scale=TRUE), row.names=rownames(fc))
# filter most variable genes
fc = mostVar(fc,30)
# Plot heatmap
h1 <- pheatmap(fc,cluster_cols = FALSE,angle_col=45)
h1